OCPBUGS-88531: Propagate restart-date annotation to CNO operand pod templates#3030
OCPBUGS-88531: Propagate restart-date annotation to CNO operand pod templates#3030bryan-cox wants to merge 1 commit into
Conversation
|
@bryan-cox: This pull request references Jira Issue OCPBUGS-88531, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughAdds a ChangesHyperShift restart-date annotation propagation
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes 🚥 Pre-merge checks | ✅ 14 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (14 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
Skipping CI for Draft Pull Request. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/hypershift/hypershift_test.go`:
- Around line 154-155: The test code is discarding error returns by using blank
identifiers (_) instead of capturing them, which can mask test failures and
produce misleading passes. Identify all instances in the test file where
function calls return errors that are being ignored (such as calls to
unstructured.NestedStringMap and similar utility functions), capture the
returned error values instead of discarding them with _, and add appropriate
assertions or error handling using the test assertion framework (e.g.,
g.Expect(err).NotTo(HaveOccurred())) to verify these operations succeed as
expected during test setup and assertions.
In `@pkg/hypershift/hypershift.go`:
- Line 291: The call to unstructured.NestedStringMap is discarding the error
return value (the third return value) by using a blank identifier, which means
any errors from malformed annotations are silently ignored. Instead of ignoring
the error, capture it as a named variable in the assignment to
unstructured.NestedStringMap, check if the error is non-nil, and handle it
appropriately before proceeding to mutate the pod-template annotations. This
ensures that rendering defects in the annotations are surfaced rather than
silently hidden.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 37272bb8-3d49-4c7b-9530-3c9966bb18b4
📒 Files selected for processing (3)
pkg/controller/operconfig/operconfig_controller.gopkg/hypershift/hypershift.gopkg/hypershift/hypershift_test.go
|
/jira refresh |
|
@bryan-cox: This pull request references Jira Issue OCPBUGS-88531, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@bryan-cox: This pull request references Jira Issue OCPBUGS-88531, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/retest-required |
|
@bryan-cox: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/retest |
|
|
||
| restartDate, _, err := unstructured.NestedString(hcp.UnstructuredContent(), "metadata", "annotations", RestartDateAnnotation) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("failed to extract restart date annotation: %v", err) |
There was a problem hiding this comment.
Nit: It would be better to use %w to preserve the error chain but since all other calls in this file use %v let's keep it.
|
/lgtm |
kyrtapz
left a comment
There was a problem hiding this comment.
When the hypershift.openshift.io/restart-date annotation is set on a HostedControlPlane CR, CNO-managed operands like cloud-network-config-controller were not being restarted. This happened because CNO uses server-side apply with deterministic manifests — applying the same manifest twice is a no-op, so operands don't roll out unless something in the pod template changes.
While the change is mostly I don't get the reasoning from the PR description.
CNO never propagated this annotation to its operand pod templates, so there was nothing to trigger a rollout. This has nothing to do with SSA.
|
|
||
| // SetRestartDateAnnotation sets the restart-date annotation on pod templates of all | ||
| // Deployment, DaemonSet, and StatefulSet objects to trigger a rolling restart. | ||
| func SetRestartDateAnnotation(objs []*unstructured.Unstructured, restartDate string) error { |
There was a problem hiding this comment.
This will cause a rollout of ALL CNO managed Deployments, DaemonSets, and StatefulSets, even if they run only in the hosted cluster.
I was sure that this annotation only affects the control-plane components in hypershift, is that not the case?
There was a problem hiding this comment.
Good catch — SetRestartDateAnnotation currently operates on all rendered objects from Render(), which includes guest-cluster DaemonSets (ovnkube-node, multus, etc.), not just the HCP-namespace control-plane Deployments. The original CPO restart logic only targeted multus-admission-controller, network-node-identity, and ovnkube-control-plane in the HCP namespace.
I'll scope this by filtering on the HCP namespace so only control-plane objects are annotated.
| r.status.UnsetProgressing(statusmanager.OperatorRender) | ||
| } | ||
|
|
||
| if hcp := bootstrapResult.Infra.HostedControlPlane; hcp != nil && hcp.RestartDate != "" { |
There was a problem hiding this comment.
When the annotation is removed there will be another rollout, is this the expected behavior?
There was a problem hiding this comment.
Yes, removing the annotation would cause another rollout since the pod template spec changes (annotation removed). This is the expected behavior — removing the annotation is an explicit operator action, and the resulting rollout restores the clean state. I'll document this in the PR description.
|
@kyrtapz re: the PR description — you're right, the SSA framing is misleading. The actual issue is simply that CNO never propagated the |
…emplates When the hypershift.openshift.io/restart-date annotation is set on a HostedControlPlane, CNO now reads it and injects it into the pod template annotations of all rendered Deployments, DaemonSets, and StatefulSets. This triggers Kubernetes rolling restarts for all CNO operands, including cloud-network-config-controller which was previously missed. The fix follows the existing pattern of reading HCP annotations in ParseHostedControlPlane (like priority-class) and modifying rendered objects post-Render (like setOVNObjectAnnotation). This is the correct architectural location for this logic since CNO owns these operands. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@bryan-cox: This pull request references Jira Issue OCPBUGS-88531, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@bryan-cox: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bryan-cox, kyrtapz, mgencur The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest-required |
|
/retest-required |
|
/retest-required |
|
@bryan-cox: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
When the
hypershift.openshift.io/restart-dateannotation is set on a HostedControlPlane CR, CNO-managed operands likecloud-network-config-controllerwere not being restarted. CNO never propagated this annotation to its operand pod templates, so there was nothing to trigger a rollout.Previously, the CPO (control-plane-operator) handled restart-date propagation by directly patching individual deployments (multus-admission-controller, network-node-identity, ovnkube-control-plane), but this approach missed cloud-network-config-controller entirely — which is the bug reported in OCPBUGS-88531.
This PR fixes the issue by having CNO propagate the annotation itself:
hypershift.openshift.io/restart-dateannotation from the HostedControlPlane CR inParseHostedControlPlane()SetRestartDateAnnotation()helper that injects the restart-date into pod template annotations of rendered Deployments, DaemonSets, and StatefulSets within the HCP namespace only (guest-cluster workloads are not affected)Render()returns, gated on HyperShift mode and a non-empty restart-date valueThe companion HyperShift PR (#8751) removes the redundant CPO-side restart logic, since CNO now handles it directly.
Annotation removal behavior
If the
restart-dateannotation is removed from the HostedControlPlane, CNO will render manifests without the annotation on pod templates. This changes the pod template spec, triggering another rollout that restores the pods to a clean state (no restart-date annotation). This is expected behavior — removing the annotation is an explicit operator action.Test plan
ParseHostedControlPlanewith restart-date annotationSetRestartDateAnnotationcovering Deployments, DaemonSets, StatefulSets, non-apps objects, namespace filtering, annotation preservation, and multi-object handlinggo build ./...passesgo test ./pkg/... ./cmd/...passesgo vetpasseshypershift.openshift.io/restart-dateannotation on a HostedCluster and verified all 4 CNO operands restart (cloud-network-config-controller, multus-admission-controller, network-node-identity, ovnkube-control-plane). Full test verification report.Fixes: https://issues.redhat.com/browse/OCPBUGS-88531